Multi-document Summarization Using Support Vector Regression

Authors

  • Sujian Li
  • You Ouyang
  • Wei Wang
  • Bin Sun
Abstract

Most multi-document summarization systems follow an extractive framework built on a variety of features. As more and more sophisticated features are designed, combining them sensibly becomes a challenge. The features are usually combined by a linear function whose weights are tuned manually. In this task, a Support Vector Regression (SVR) model is used to combine the features and score the sentences automatically. Two problems inevitably arise. The first is how to acquire the training data: several methods are introduced for generating it automatically from the standard reference summaries written by humans. The second is feature selection, in which various features are picked out and combined into different feature sets to be tested. Using the DUC 2005 and 2006 data sets, comprehensive experiments are conducted over various SVR kernels and feature sets. The trained SVR model is then used in the main task of DUC 2007 to produce the extractive summaries.
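As an illustration of the training-data step, the sketch below shows one plausible way to derive regression targets from human reference summaries. The abstract does not specify the generation methods, so the unigram-overlap target used here is an assumption, and the names `overlap_score`, `build_training_pairs`, and `toy_features` are hypothetical helpers rather than anything taken from the paper.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(sentence, references):
    """Assumed target: mean fraction of the sentence's tokens that also
    occur in each reference summary (counts clipped per word)."""
    sent_counts = Counter(tokenize(sentence))
    total = sum(sent_counts.values())
    if total == 0 or not references:
        return 0.0
    scores = []
    for ref in references:
        ref_counts = Counter(tokenize(ref))
        matched = sum(min(c, ref_counts[w]) for w, c in sent_counts.items())
        scores.append(matched / total)
    return sum(scores) / len(scores)


def toy_features(sentence):
    """Hypothetical two-feature vector: token count and mean token length."""
    tokens = tokenize(sentence)
    if not tokens:
        return [0.0, 0.0]
    return [float(len(tokens)), sum(len(t) for t in tokens) / len(tokens)]


def build_training_pairs(sentences, references, feature_fn):
    """Pair each sentence's feature vector with its overlap-based target."""
    return [(feature_fn(s), overlap_score(s, references)) for s in sentences]
```

Pairs produced this way could then be passed to any regression learner, for example an SVR implementation such as scikit-learn's `sklearn.svm.SVR`, whose predicted scores would rank candidate sentences for extraction.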

Similar resources

Building a Trainable Multi-document Summarizer

This paper describes an approach to building a trainable multi-document summarization system, using a simple training process based on support vector machines. The summarization system is trained and tested using the DUC 2005 data set. The evaluation results based on ROUGE scores are presented and methods for improving the performance of the summarization system are identified.

Extractive Multi-Document Summarization with Integer Linear Programming and Support Vector Regression

We present a new method to generate extractive multi-document summaries. The method uses Integer Linear Programming to jointly maximize the importance of the sentences it includes in the summary and their diversity, without exceeding a maximum allowed summary length. To obtain an importance score for each sentence, it uses a Support Vector Regression model trained on human-authored summaries, w...

TGSum: Build Tweet Guided Multi-Document Summarization Dataset

The development of summarization research has been significantly hampered by the costly acquisition of reference summaries. This paper proposes an effective way to automatically collect large scales of news-related multi-document summaries with reference to social media’s reactions. We utilize two types of social labels in tweets, i.e., hashtags and hyper-links. Hashtags are used to cluster doc...

A SVM-Based Ensemble Approach to Multi-Document Summarization

In this paper, we present a Support Vector Machine (SVM) based ensemble approach to combat the extractive multi-document summarization problem. Although SVM can have a good generalization ability, it may experience a performance degradation through wrong classifications. We use a committee of several SVMs, i.e. Cross-Validation Committees (CVC), to form an ensemble of classifiers where the stra...

Automatic Annotation Techniques for Supervised and Semi-supervised Query-focused Summarization

In this paper, we study one semi-supervised and several supervised methods for extractive query-focused multi-document summarization. Traditional approaches to multidocument summarization are either unsupervised or supervised. The unsupervised approaches use heuristic rules to select the most important sentences, which are hard to generalize. On the other hand, huge amount of annotated data is ...

Publication date: 2007